exploratory data analysis and models on the epi dataset

date: 2025-10-13

dataset and choices

1) variable distributions

1.1 boxplots and histograms (with density!)
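
a minimal sketch of what "with density" means for a histogram: bar heights are rescaled so that the bar areas sum to 1, which lets a density curve be drawn on the same axis. the function name, bin count and sample values are illustrative assumptions, not taken from the epi dataset.

```python
def density_histogram(values, n_bins=5):
    """Return (bin edges, densities) for equal-width bins over the data range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls exactly on the last edge, so clamp its index.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    # Dividing by (n * width) turns counts into densities whose areas sum to 1.
    densities = [c / (len(values) * width) for c in counts]
    return edges, densities


# Made-up numbers for illustration only.
sample_values = [32.1, 35.4, 41.0, 44.7, 45.2, 48.9, 52.3, 57.8, 61.5, 70.2]
edges, densities = density_histogram(sample_values)
```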

1.2 qq plot (two-sample)
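
a two-sample qq plot pairs the quantiles of one sample with the quantiles of another at the same probabilities; points near a straight line suggest the two distributions have a similar shape. the sketch below only computes the point pairs, assuming linear interpolation between order statistics and (i + 0.5) / n plotting positions. both samples are made up.

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list, 0 <= p <= 1."""
    if not sorted_values:
        raise ValueError("empty sample")
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def qq_points(sample_a, sample_b, n_points=None):
    """Return (quantile of a, quantile of b) pairs at shared probabilities."""
    a, b = sorted(sample_a), sorted(sample_b)
    if n_points is None:
        n_points = min(len(a), len(b))
    # Plotting positions (i + 0.5) / n avoid the extreme probabilities 0 and 1.
    probs = [(i + 0.5) / n_points for i in range(n_points)]
    return [(quantile(a, p), quantile(b, p)) for p in probs]


# Made-up samples: sample_b is sample_a shifted up by 2.
sample_a = [41.0, 44.5, 47.2, 50.1, 53.3, 58.9]
sample_b = [43.0, 46.5, 49.2, 52.1, 55.3, 60.9]
pairs = qq_points(sample_a, sample_b)
```

for a pure shift like this one, the pairs lie on a line parallel to y = x, offset by the size of the shift.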

2) linear models

2.1 models on the full dataset

full: EPI.new ~ gdp

full: EPI.new ~ gdp + population
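
a formula such as EPI.new ~ gdp + population describes a least-squares fit of the response on the listed predictors plus an intercept. the sketch below shows that fit through the normal equations in plain python; the helper names and the data are hypothetical, and this is not the code that produced the fits in this report.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_ols(predictors, response):
    """Least-squares fit with an intercept; returns (coefficients, r_squared)."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    coefs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in design]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return coefs, 1.0 - ss_res / ss_tot


# Made-up data generated exactly as y = 10 + 2 * gdp - 1 * population.
gdp = [1.0, 2.0, 3.0, 4.0, 5.0]
population = [2.0, 1.0, 4.0, 3.0, 6.0]
response = [10 + 2 * g - p for g, p in zip(gdp, population)]
coefs, r_squared = fit_ols(list(zip(gdp, population)), response)
```

the normal equations are easy to read but can be numerically fragile when predictors are on very different scales or nearly collinear; statistical libraries typically use a QR decomposition instead.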

2.2 same models on one region (comparison)

on the Sub-Saharan Africa subset, the better of the two models is EPI.new ~ gdp + population (r²=0.361, aic=265.4, bic=272.7).
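
aic and bic both combine goodness of fit with a penalty on the number of parameters, and lower is better. the sketch below computes them for a gaussian linear model from the residual sum of squares, assuming the convention that counts the error variance as one extra parameter. the rss values and sample size are made up and are not meant to reproduce the figures quoted above.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a linear model with normal errors."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic_bic(rss, n, n_coefficients):
    """Return (AIC, BIC); n_coefficients includes the intercept."""
    k = n_coefficients + 1  # assumption: the error variance counts as a parameter
    log_lik = gaussian_log_likelihood(rss, n)
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    return aic, bic


# Made-up residual sums of squares for a smaller and a larger model.
n_obs = 40
aic_small, bic_small = aic_bic(rss=2500.0, n=n_obs, n_coefficients=2)
aic_large, bic_large = aic_bic(rss=2100.0, n=n_obs, n_coefficients=3)
preferred = "large" if aic_large < aic_small else "small"
```

with these made-up numbers, the drop in rss outweighs the penalty for the extra coefficient, so the larger model is preferred by both criteria.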

3) classification (knn, label = region)
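
a minimal sketch of k-nearest-neighbours classification: each query point takes the majority label among its k closest training points. euclidean distance, k = 3, the feature values and the labels "region X" and "region Y" are all illustrative assumptions; the features and k used by model A and model B are not restated here.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    if k < 1 or k > len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    # On a tied vote, most_common returns the tied label that was seen first,
    # which here is the label of the nearer neighbour.
    return votes.most_common(1)[0][0]


# Made-up two-feature training data with hypothetical labels.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["region X"] * 3 + ["region Y"] * 3

prediction_near_x = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
prediction_near_y = knn_predict(train_points, train_labels, (4.9, 5.0), k=3)
```

because euclidean distance is sensitive to units, features on very different scales are usually standardized before knn is applied.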

model A

model B